JHU_GL_logo.png

Project 1 - DualLens Analytics

image.png

Background Story

In the rapidly evolving world of finance and technology, investors are constantly seeking ways to make smarter decisions by combining traditional financial analysis with emerging technological insights. While stock market trends provide a numerical perspective on growth, an organization’s initiatives in cutting-edge fields like Artificial Intelligence (AI) reveal its future readiness and innovation potential. However, analyzing both dimensions - quantitative financial performance and qualitative AI initiatives - requires sifting through multiple, diverse data sources: stock data from platforms like Yahoo Finance, reports in PDFs, and contextual reasoning using Large Language Models (LLMs).

This is where DualLens Analytics comes in. By applying a dual-lens approach, the project leverages Retrieval-Augmented Generation (RAG) to merge financial growth data with strategic insights from organizational reports. Stock data provides evidence of stability and momentum, while AI initiative documents reveal forward-looking innovation. Together, they form a richer, more holistic picture of organizational potential.

With DualLens Analytics, investors no longer need to choose between numbers and narratives: they gain a unified, AI-driven perspective that ranks organizations by both financial strength and innovation readiness, enabling smarter, future-focused investment strategies.

Problem Statement

Traditional investment analysis often focuses on financial metrics alone (e.g., stock growth, revenue, market cap), missing the qualitative dimension of how prepared a company is for the future. On the other hand, qualitative documents like strategy PDFs contain valuable insights about innovation and AI initiatives, but they are difficult to structure, query, and integrate with numeric financial data.

This leads to three core challenges:

  1. Fragmented Data Sources: Financial data (stock prices) and strategic insights (PDFs) exist in silos.

  2. Limited Analytical Scope: Manual analysis of growth trends and PDF reports is time-consuming and error-prone.

  3. Decisional Blind Spots: Without integrating both quantitative (growth trends) and qualitative (AI initiatives) signals, investors may miss out on high-potential organizations.

Solution Approach

To address this challenge, we set out to build a Retrieval-Augmented Generation (RAG) powered system that blends financial trends with AI-related strategic insights, helping investors rank organizations based on growth trajectory and innovation capacity.

image.png

NOTE

Look for "--- --- ---" in the notebook and add your code there; it is a placeholder.

Setting up Installations and Imports

In [55]:
# @title Run this cell => Restart the session => Start executing the below cells **(DO NOT EXECUTE THIS CELL AGAIN)**

!pip install langchain==0.3.25 \
                langchain-core==0.3.65 \
                langchain-openai==0.3.24 \
                chromadb==0.6.3 \
                langchain-community==0.3.20 \
                pypdf==5.4.0
Requirement already satisfied: langchain==0.3.25 in /usr/local/lib/python3.12/dist-packages (0.3.25)
Requirement already satisfied: langchain-core==0.3.65 in /usr/local/lib/python3.12/dist-packages (0.3.65)
Requirement already satisfied: langchain-openai==0.3.24 in /usr/local/lib/python3.12/dist-packages (0.3.24)
Requirement already satisfied: chromadb==0.6.3 in /usr/local/lib/python3.12/dist-packages (0.6.3)
Requirement already satisfied: langchain-community==0.3.20 in /usr/local/lib/python3.12/dist-packages (0.3.20)
Requirement already satisfied: pypdf==5.4.0 in /usr/local/lib/python3.12/dist-packages (5.4.0)
In [56]:
import yfinance as yf              # Used for gathering stock prices
import matplotlib.pyplot as plt    # Used for Data Visualization / Plots / Graphs
import pandas as pd                # Helpful for working with tabular data like DataFrames
import os                          # Interacting with the operating system

from langchain.text_splitter import RecursiveCharacterTextSplitter      #  Helpful in splitting the PDF into smaller chunks
from langchain_community.document_loaders import PyPDFDirectoryLoader, PyPDFLoader     # Loading a PDF
from langchain_community.vectorstores import Chroma    # Vector DataBase

1. Organization Selection

We select the following five organizations as the analysis pool.

In [57]:
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]

2. Setting up the LLM - 1 Mark

  • The config.json file should contain API_KEY and API BASE URL provided by OpenAI.
  • You need to insert your actual API keys and endpoint URL obtained from your Olympus account. Refer to the OpenAI Access Token documentation for more information on how to generate and manage your API keys.
  • This code reads the config.json file and extracts the API details.
    • The API_KEY is a unique secret key that authorizes your requests to OpenAI's API.
    • The OPENAI_API_BASE is the API BASE URL where the model will process your requests.

What To Do?

  • Use the sample config.json file provided.
  • Add your OpenAI API key and base URL to the file.
  • The config.json should look like this:

    {
        "API_KEY": "your_openai_api_key_here",
        "OPENAI_API_BASE": "https://your_openai_api_base/v1"
    }
In [58]:
#Loading the `config.json` file
import json
import os

# Load the JSON file and extract values
file_name = "config.json"
with open(file_name, 'r') as file:
    config = json.load(file)
    os.environ['OPENAI_API_KEY'] = config["API_KEY"] # Loading the API Key
    os.environ["OPENAI_BASE_URL"] = config["OPENAI_API_BASE"] # Loading the API Base Url
In [59]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",                      # "gpt-4o-mini" to be used as an LLM
    temperature=0,           # Set the temperature to 0
    max_tokens=5000,                 # Set the max_tokens = 5000, so that the long response will not be clipped off
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

3. Visualization and Insight Extraction - 5 Marks

generated-image (2) (1).png

Gather stock data for the selected organizations from the past three years using the yfinance library, and visualize this data for enhanced analysis.

**Your Task**

  1. Loop through each company to retrieve stock data of the last three years using the YFinance library.
  2. Plot the closing prices for each company.
In [60]:
plt.figure(figsize=(14,7))

# Loop through each company and plot closing prices
for symbol in companies:
    ticker = yf.Ticker(symbol)
    data = ticker.history(period="3y")

    # Plot closing price
    plt.plot(data.index, data['Close'], label=symbol)

plt.title("Stock Price Trends (Last 3 Years)")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.legend()
plt.grid(True)
plt.savefig("Stock_Price_Trends_3Y.png")
plt.show()

Financial Metrics

  1. Market Cap: Total market value of a company’s outstanding shares.
  2. P/E Ratio: Shows how much investors are willing to pay per dollar of earnings.
  3. Dividend Yield: Annual dividend income as a percentage of the stock price.
  4. Beta: Measures a stock’s volatility relative to the overall market.
  5. Total Revenue: The total income a company generates from its business operations.
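Most of these metrics are simple ratios. For instance, the P/E ratio is just the share price divided by earnings per share; the numbers below are made up purely for illustration:

```python
# Hypothetical example: a stock trading at $100 with annual earnings of $5 per share
price_per_share = 100.0
earnings_per_share = 5.0

# Investors are paying $20 for each $1 of annual earnings
pe_ratio = price_per_share / earnings_per_share
print(pe_ratio)  # 20.0
```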

**Your Task**

  1. Loop through all the companies to fetch data based on the specified financial metrics.
  2. Create a DataFrame (DF) from the collected data.
  3. Visualize and compare each financial metric across all companies.
  4. For example, visualize and compare the market capitalization for each company.

Tip: Check ticker.info for the available financial metrics
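Since `ticker.info` is a plain dictionary, you can pull each metric with `.get()` and a default value, which keeps the loop from crashing when a key is missing for some company. A minimal offline sketch; the `sample_info` values below are invented, not live Yahoo Finance data:

```python
# Illustrative subset of the keys ticker.info typically exposes (values are invented)
sample_info = {
    "marketCap": 3_700_000_000_000,
    "trailingPE": 30.3,
    "dividendYield": 0.27,
    "beta": 1.07,
}

# Defaulting to 0 keeps the metrics table rectangular even when a key is absent
wanted = ["marketCap", "trailingPE", "dividendYield", "beta", "freeCashflow"]
metrics = {key: sample_info.get(key, 0) for key in wanted}
print(metrics["freeCashflow"])  # 0 — this key is missing from the sample
```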

In [61]:
import pandas as pd
import matplotlib.pyplot as plt

companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN", "META"]
metrics_list = {}

# Fetching the financial metrics
for symbol in companies:                          # Loop through all the companies
    ticker = yf.Ticker(symbol)
    info = ticker.info
    metrics_list[symbol] = {                              # Define the dictionary of all the Financial Metrics
        "Market Cap": info.get("marketCap", 0),
        "P/E Ratio": info.get("trailingPE", 0),
        "P/E Growth Ratio": info.get("trailingPegRatio", 0),
        "Total Revenue": info.get("totalRevenue", 0),
        "Return on Equity (ROE)": info.get("returnOnEquity", 0),
        "Free Cash Flow": info.get("freeCashflow", 0),
        "Price-to-Book (P/B) Ratio": info.get("priceToBook", 0),
        "Debt-to-Equity Ratio": info.get("debtToEquity", 0),
        "Dividend Yield": info.get("dividendYield", 0),
        "Beta": info.get("beta", 0)
    }
In [62]:
# Convert to DataFrame
df = pd.DataFrame(metrics_list).T

# Converting large numbers to billions for readability by dividing the whole column by 1e9
df["Market Cap"] = df["Market Cap"] / 1e9
df["Total Revenue"] = df["Total Revenue"] / 1e9
df["Free Cash Flow"] = df["Free Cash Flow"] / 1e9
df["Dividend Yield"] = df["Dividend Yield"] * 100  # Convert to percentage

df   # Printing the df
Out[62]:
|       | Market Cap | P/E Ratio | P/E Growth Ratio | Total Revenue | Return on Equity (ROE) | Free Cash Flow | Price-to-Book (P/B) Ratio | Debt-to-Equity Ratio | Dividend Yield | Beta |
|-------|-----------:|----------:|-----------------:|--------------:|-----------------------:|---------------:|--------------------------:|---------------------:|---------------:|-----:|
| GOOGL | 3720.359444 | 30.291912 | 1.6296 | 385.476002 | 0.35450 | 47.997751 | 9.588861 | 11.424 | 27.0 | 1.070 |
| MSFT  | 3611.924365 | 34.609688 | 1.9865 | 293.812011 | 0.32241 | 53.327376 | 9.949223 | 33.154 | 75.0 | 1.070 |
| IBM   | 281.336611 | 35.830956 | 2.0596 | 65.401999 | 0.30156 | 11.757500 | 10.082069 | 237.831 | 223.0 | 0.689 |
| NVDA  | 4406.563570 | 44.799507 | 0.6921 | 187.141997 | 1.07359 | 53.282873 | 36.997140 | 9.102 | 2.0 | 2.284 |
| AMZN  | 2430.420648 | 32.157000 | 1.5384 | 691.330023 | 0.24327 | 26.080000 | 6.573279 | 43.405 | 0.0 | 1.372 |
| META  | 1660.448014 | 29.149115 | 1.5557 | 189.458006 | 0.32643 | 18.617750 | 8.557677 | 26.311 | 32.0 | 1.273 |
In [63]:
import matplotlib.pyplot as plt
import math

metrics_to_plot = df.columns.tolist()

colors = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"
]

n_cols = 3
n_rows = math.ceil(len(metrics_to_plot) / n_cols)

fig, axes = plt.subplots(
    n_rows,
    n_cols,
    figsize=(18, 4 * n_rows)   # Scale figure height with the number of rows
)

axes = axes.flatten()

for i, metric in enumerate(metrics_to_plot):
    ax = axes[i]
    ax.bar(df.index, df[metric], color=colors[i % len(colors)])
    ax.set_title(f"{metric} Comparison")
    ax.set_ylabel(metric)
    ax.set_xlabel("Company")
    ax.grid(axis='y')

# Remove unused subplots
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout(pad=2.0)
plt.show()

4. RAG-Driven Analysis - 7 Marks

generated-image (1) (1).png

Perform a RAG-driven analysis of the companies' AI initiatives.

**Your Task**

  1. Extract all PDF files from the provided ZIP file.
  2. Read the content from each PDF file.
  3. Split the content into manageable chunks.
  4. Store the chunks in a vector database using embedding functions.
  5. Implement a query mechanism on the vector database to retrieve results based on user queries regarding AI initiatives.
  6. Evaluate the LLM-generated response using LLM-as-Judge.

A. Loading Company AI Initiative Documents (PDFs) - 1 mark

In [64]:
# Unzipping the AI Initiatives Documents
import zipfile
with zipfile.ZipFile("/content/pdf_data/Companies-AI-Initiatives.zip", 'r') as zip_ref:
  zip_ref.extractall("/content/pdf_data")         # Storing all the unzipped contents in this location
In [65]:
# Path of all AI Initiative Documents
ai_initiative_pdf_paths = [f"/content/pdf_data/Companies-AI-Initiatives/{file}" for file in os.listdir("/content/pdf_data/Companies-AI-Initiatives")]
ai_initiative_pdf_paths
Out[65]:
['/content/pdf_data/Companies-AI-Initiatives/MSFT.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/IBM.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/NVDA.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/GOOGL.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/AMZN.pdf']
In [66]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader(path = "/content/pdf_data/Companies-AI-Initiatives/")          # Creating a PDF loader object
In [68]:
# Defining the text splitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=1000,
    chunk_overlap=200
)
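Because the splitter is built with `from_tiktoken_encoder`, `chunk_size` and `chunk_overlap` are measured in `cl100k_base` tokens, not characters. To build intuition for how overlapping windows work, here is a toy character-based sketch (this is an illustration only, not the actual tiktoken-based splitter):

```python
def toy_split(text, chunk_size=10, chunk_overlap=4):
    """Slide a fixed-size window with overlap — illustrates the idea only."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = toy_split("abcdefghijklmnopqrst", chunk_size=10, chunk_overlap=4)
# Each chunk shares its last 4 characters with the start of the next,
# so context at a chunk boundary is never completely lost.
```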
In [69]:
# Splitting the chunks using the text splitter
ai_initiative_chunks = loader.load_and_split(text_splitter)
In [70]:
# Total length of all the chunks
len(ai_initiative_chunks)
Out[70]:
62

B. Vectorizing AI Initiative Documents with ChromaDB - 1 mark

In [71]:
# Defining the 'text-embedding-ada-002' as the embedding model
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
In [72]:
#  Creating a Vectorstore, storing all the above created chunks using an embedding model
vectorstore = Chroma.from_documents(
    ai_initiative_chunks,
    embedding_model,
    collection_name="AI_Initiatives"
)

# Ignore if it gives an error or warning
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given

You can safely ignore this error. It is a known, harmless telemetry issue in Chroma and does NOT affect your vector store, embeddings, or retrieval:

  • Chroma tries to send anonymous usage telemetry.
  • A version mismatch between Chroma and its telemetry dependency (posthog) makes the telemetry call fail.
  • Your vectorstore is still created successfully: the data is stored, the embeddings work, and similarity search works.
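If the repeated telemetry errors bother you, Chroma can be told to skip telemetry entirely at collection-creation time. This is a configuration sketch, assuming the `client_settings` parameter of `Chroma.from_documents` and `chromadb.config.Settings(anonymized_telemetry=False)` behave as in Chroma 0.6.x:

```python
from chromadb.config import Settings

vectorstore = Chroma.from_documents(
    ai_initiative_chunks,
    embedding_model,
    collection_name="AI_Initiatives",
    client_settings=Settings(anonymized_telemetry=False),  # suppress telemetry calls
)
```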

In [73]:
# Creating a retriever object that fetches the ten most similar results from the vectorstore
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 10}
)

C. Retrieving relevant Documents - 3 marks

In [74]:
user_message = "Give me the best project that `IBM` company is working upon"
In [75]:
# Building the context for the query using the retrieved chunks
relevant_document_chunks = retriever.invoke(user_message)   # `get_relevant_documents` is deprecated; `invoke` is the current API
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
In [76]:
len(relevant_document_chunks)
Out[76]:
10
In [77]:
# Write a system message for an LLM to help craft a response from the provided context
qna_system_message = """
You are an assistant that reviews the provided articles and answers user questions from the context.
The user input will contain the context you need to answer the question.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""
In [78]:
# Write a user message template that attaches the context and the question
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""
In [79]:
# Format the prompt
formatted_prompt = f"""[INST]{qna_system_message}\n
                user: {qna_user_message_template.format(context=context_for_query, question=user_message)}
                [/INST]"""
In [81]:
from IPython.display import display, Markdown
In [82]:
# Make the LLM call
resp = llm.invoke(formatted_prompt)
display(Markdown(resp.content))

IBM is currently focusing on the Granite project, which involves a series of open-source, high-performance AI foundation models designed to empower enterprise applications across various industries. The Granite models are efficient, customizable, and scalable, enabling businesses to integrate advanced AI capabilities into their workflows while maintaining control over their data and models.

In [83]:
# Define RAG function
def RAG(user_message):
    """
    Args:
        user_message: User query for which an answer should be generated from the vectorDB.
    Returns:
        The LLM's answer, grounded in the retrieved context.
    """
    relevant_document_chunks = retriever.get_relevant_documents(user_message)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine qna_system_message and qna_user_message_template to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                user: {qna_user_message_template.format(context=context_for_query, question=user_message)}
                [/INST]"""

    # Querying the LLM
    try:
        response = llm.invoke(prompt)
    except Exception as e:
        return f'Sorry, I encountered the following error: \n {e}'

    return response.content
In [84]:
# Test Cases
display(Markdown(RAG("How is the area in which GOOGL is working different from the area in which MSFT is working?")))
# print(RAG("How is the area in which GOOGL is working different from the area in which MSFT is working?"))

Google (GOOGL) specializes in search, advertising, cloud computing, hardware, and software services with a strong focus on artificial intelligence (AI) research and deployment through initiatives like Google Brain and DeepMind. Its AI ecosystem includes natural language processing, computer vision, speech recognition, and generative AI that power consumer products such as Google Search and Gmail.

In contrast, Microsoft (MSFT) focuses on software development, cloud computing through Azure AI solutions, and enterprise applications. It has expanded its AI capabilities via partnerships with OpenAI and proprietary models like Copilot for productivity enhancement in Microsoft 365 apps.

While both companies are involved in AI development across various domains including natural language processing and generative AI, Google's emphasis is more on integrating advanced multimodal foundation models into consumer products to enhance user experience. Microsoft's approach centers around embedding AI into enterprise solutions to streamline workflows within business applications.

In [85]:
# print(RAG("What are the three projects on which MSFT is working upon?"))
display(Markdown(RAG("What are the three projects on which MSFT is working upon?")))

Microsoft is working on the following three projects:

  1. Azure AI Foundry Labs: An experimental AI platform to accelerate the translation of advanced AI research into real-world applications, supporting experimentation with various AI technologies.

  2. Microsoft 365 Copilot: An AI-powered productivity assistant embedded across Microsoft 365 applications that enhances productivity by providing intelligent assistance in tasks such as drafting content and analyzing data.

  3. GitHub Copilot: An AI-driven coding support tool that improves developer productivity by offering advanced features for enterprises, alongside IntelliCode which provides lightweight assistance within the IDE.

In [86]:
# print(RAG("What is the timeline of each project in NVDA?"))
display(Markdown(RAG("What is the timeline of each project in NVDA?")))
  • Project G-Assist Timeline:

    • Concept & Demo Phase: Early prototypes were teased in NVIDIA showcases tied to RTX AI initiatives.
    • Public Availability: G-Assist became accessible via the NVIDIA App in 2024–2025, marking the first time consumers could interact with the assistant at scale.
    • Iterative Updates: Throughout 2024 and 2025, NVIDIA improved memory efficiency, broadened GPU compatibility, and launched plugin SDKs. Developer hackathons were also introduced during this period.
  • DLSS Timeline:

    • As of 2025, DLSS 4 is fully available and integrated into many new AAA titles. It is actively promoted in NVIDIA’s Game Ready drivers and continues to expand through partnerships with major publishers.
In [87]:
# print(RAG("What are the areas in which AMZN is investing when it comes to AI?"))
display(Markdown(RAG("What are the areas in which AMZN is investing when it comes to AI?")))

Amazon is investing in several areas related to AI, including:

  1. Retail: Enhancing product recommendations, dynamic pricing, fraud detection, and supply chain optimization.
  2. Amazon Web Services (AWS): Offering AI and machine learning tools for businesses to build intelligent applications.
  3. Voice Assistance: Innovations like Alexa that understand speech and perform tasks.
  4. Robotics: Streamlining order fulfillment in warehouses through robotics technology.
  5. Generative AI Applications: Developing platforms like Amazon Bedrock for democratizing access to generative AI technologies and simplifying the development of AI applications.
  6. Multimodal AI Models: Initiatives like Olympus that process text, images, and videos simultaneously to enhance search functionalities.

These investments aim to make services smarter, more efficient, and more convenient for customers and businesses alike, while strengthening Amazon's position in the competitive AI landscape.

In [88]:
# print(RAG("What are the risks associated with projects within GOOG?"))
display(Markdown(RAG("What are the risks associated with projects within GOOG?")))

Several risks accompany the development of projects within Google, including:

  • Privacy Concerns: Processing live video and audio data raises significant privacy issues, necessitating robust data protection measures.
  • Technical Hurdles: Achieving real-time, accurate multimodal understanding requires overcoming complex AI and hardware challenges.
  • User Acceptance: Gaining user trust and acceptance for a new form of AI assistant that interacts in more personal and potentially intrusive ways.
  • Regulatory Compliance: Navigating the evolving landscape of AI regulations and ensuring compliance with global standards.
  • Model Safety: Hallucinations and factual inaccuracies remain a risk, requiring constant evaluation and moderation.
  • Regulatory Scrutiny: Integrations could attract antitrust scrutiny related to Chrome and Workspace products.
  • Compute Costs: High-performing models require significant energy and infrastructure, increasing operational costs.
  • Competition: Maintaining differentiation amid competitors like OpenAI, Meta, Anthropic, and emerging open models.

D. Evaluation of the RAG - 2 marks

In [89]:
# Writing a question for performing evaluations on the RAG
evaluation_test_question = "What are the three projects on which MSFT is working upon?"
In [90]:
# Building the context for the evaluation test question using the retrieved chunks
relevant_document_chunks = retriever.get_relevant_documents(evaluation_test_question)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
In [91]:
# Default RAG Answer
answer = RAG(evaluation_test_question)
# print(answer)
display(Markdown(answer))

Microsoft is working on the following three projects:

  1. Azure AI Foundry Labs: An experimental AI platform to accelerate the translation of advanced AI research into real-world applications, supporting experimentation with various AI technologies.

  2. Microsoft 365 Copilot: An AI-powered productivity assistant embedded across Microsoft 365 applications that enhances productivity by providing intelligent assistance in tasks such as drafting content and analyzing data.

  3. GitHub Copilot: An AI-driven coding support tool that improves developer productivity by offering advanced features for enterprises, alongside IntelliCode which provides lightweight assistance within the IDE.

In [92]:
# Defining a user message template for evaluation
evaluation_user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""
1. Groundedness
In [93]:
# Writing the system message and the evaluation metrics for checking the groundedness
groundedness_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""
In [94]:
# Combining groundedness_rater_system_message + llm_prompt + answer for evaluation
groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
            user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
            [/INST]"""
In [95]:
# Defining a new LLM object
groundedness_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

# Using the LLM-as-Judge for evaluating Groundedness
groundedness_response = groundedness_checker.invoke(groundedness_prompt)
# print(groundedness_response.content)
display(Markdown(groundedness_response.content))

Steps to Evaluate the Answer

  1. Identify Key Information in the Context: Extract the main projects mentioned in the context that Microsoft is working on.
  2. Compare with AI Generated Answer: Check if all three projects listed in the answer are present and accurately described based on information from the context.
  3. Assess Completeness and Accuracy: Determine if any additional information not found in the context has been included or if any critical details have been omitted.
  4. Rate Adherence to Metric: Based on how well the answer aligns with only using information from the provided context, assign a score according to evaluation criteria.

Step-by-Step Explanation of Adherence

  1. The question asks for "the three projects" Microsoft is working on, which implies a need for specificity and clarity regarding those projects.
  2. The provided context mentions:
    • Azure AI Foundry Labs
    • Microsoft 365 Copilot
    • GitHub Copilot (alongside IntelliCode)
  3. The AI-generated answer lists these three initiatives clearly:
    • It describes Azure AI Foundry Labs as an experimental platform supporting various AI technologies, which matches what was presented in detail within the context.
    • It accurately summarizes Microsoft 365 Copilot's role as an assistant across applications like Word and Excel, reflecting its purpose of enhancing productivity through intelligent assistance.
    • GitHub Copilot is correctly identified as a coding support tool aimed at improving developer productivity; it also mentions IntelliCode appropriately without deviating from what was stated about their relationship.

The descriptions align closely with those found within the original text, ensuring that no extraneous or incorrect information has been introduced.

Evaluation of Metric Adherence

  • The answer strictly derives its content from what was presented in the given context without introducing unrelated facts or interpretations beyond what's specified about each project.

Rating

Given that all elements required by both question and metric are met completely:

Score = 5 (The metric is followed completely)

2. Relevance
In [96]:
# Writing the system message and the evaluation metrics for checking the relevance
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""
In [97]:
# Combining relevance_rater_system_message + llm_prompt + answer for evaluation
relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
            user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
            [/INST]"""
In [99]:
# Defining a new LLM object
relevance_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

# Using the LLM-as-Judge for evaluating Relevance
relevance_response = relevance_checker.invoke(relevance_prompt)
# print(relevance_response.content)
display(Markdown(relevance_response.content))

Steps to Evaluate the Context as per the Metric:

  1. Identify Key Aspects of the Question: Determine what specific information is being asked in the question.
  2. Analyze Context for Relevant Information: Review the context provided to see if it contains information that directly answers all parts of the question.
  3. Check for Completeness and Exclusivity: Ensure that all important aspects are included in the answer and that irrelevant details are excluded.
  4. Assess Clarity and Conciseness: Evaluate whether the answer is clear, concise, and directly addresses each aspect of the question without unnecessary elaboration.

Step-by-Step Explanation:

  1. The question asks specifically about "the three projects on which MSFT is working upon." This indicates a need for a straightforward list or description of three distinct projects.
  2. The context provides detailed descriptions of several initiatives by Microsoft, including Azure AI Foundry Labs, Microsoft 365 Copilot, GitHub Copilot (and IntelliCode). Each project mentioned has relevant details about its purpose and functionality.
  3. The AI-generated answer lists exactly three projects—Azure AI Foundry Labs, Microsoft 365 Copilot, and GitHub Copilot—along with brief descriptions that capture their essence without extraneous detail.
  4. All key aspects from both context and question are addressed clearly; there’s no irrelevant information included.

Evaluation:

  • The answer includes all three projects explicitly requested in a clear manner while summarizing their functions effectively based on provided context.
  • It does not include any superfluous information outside what was necessary to address each project succinctly.

Rating:

The metric is followed completely as all relevant aspects were addressed appropriately without any omissions or unnecessary additions.

Score: 5
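Both judges end with a free-text "Score" line. For automated evaluation over many questions, it helps to pull the numeric rating out of each response programmatically. A minimal sketch, assuming the judge keeps the "Score" wording seen above:

```python
import re

def extract_score(judge_text):
    """Return the 1-5 rating from an LLM-judge response, or None if absent."""
    match = re.search(r"Score\s*[:=]?\s*([1-5])", judge_text)
    return int(match.group(1)) if match else None
```

Collected over a test set of questions, these scores yield average groundedness and relevance numbers rather than one-off judgments.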

5. Scoring and Ranking - 3 Marks

image.png

Prompting an LLM to score each company by integrating Quantitative data (stock trend, growth metrics) and Qualitative evidence (PDF insights)

**Your Task**

  1. Write a system message and a user message that outlines the required data for the prompt.
  2. Prompt the LLM to rank and recommend companies for investment based on the provided PDF and stock data to achieve better returns.
In [100]:
# Counting the document chunks stored in the vectorstore
len(vectorstore.get()['documents'])
Out[100]:
124
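The scoring prompt below passes all 124 stored chunks to the LLM verbatim. If the store contains exact-duplicate chunks (e.g. from re-running ingestion), dropping them first shrinks the prompt at no information cost. A sketch:

```python
def dedupe_chunks(chunk_texts):
    """Drop exact-duplicate chunk texts while preserving their original order."""
    seen, unique = set(), []
    for text in chunk_texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique

# Usage:
# unique_docs = dedupe_chunks(vectorstore.get()['documents'])
```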
In [101]:
# Write a system message for instructing the LLM for scoring and ranking the companies
system_message = """
You are a financial analyst assistant. Your task is to evaluate and rank a list of companies for investment potential.

You will be provided with two types of data:
1. Quantitative financial growth metrics such as market capitalization, P/E ratio, P/E Growth Ratio, Total Revenue, Return on Equity (ROE), Free Cash Flow, Price-to-Book (P/B) Ratio, Debt-to-Equity Ratio, Dividend Yield, and beta.
2. Qualitative strategic insights extracted from organizational reports (AI initiatives and other strategic information).

Your goal is to analyze both the quantitative data and qualitative insights to score each company on overall growth potential, innovation, risk, and strategic positioning.

You need to rank the companies from most to least recommended for investment, providing a brief explanation of your reasoning behind the top 3 picks.

Be clear, concise, and justify your rankings based on both data types.
"""
In [102]:
# Write a user message for instructing the LLM for scoring and ranking the companies
user_message = f"""
You are given:

---
### 1. Financial Data (Quantitative)
{df.to_string()}

---
### 2. Strategic Insights (Qualitative)
{vectorstore.get()['documents']}

---

Please score and rank these companies from best to worst investment opportunities. Consider financial growth metrics *and* the qualitative strategic insights.

Provide:
- A ranked list of companies
- Scores or ratings for each company
- Key reasons supporting the rankings, emphasizing strengths and risks

Your evaluation should help an investor decide where to allocate capital for better returns.
The response should be clear, concise, and formatted in Markdown so that it renders nicely in Google Colab.
"""
In [103]:
# Formatting the prompt
formatted_prompt = f"""[INST]{system_message}\n
                user: {user_message}
                [/INST]"""
In [104]:
# Calling the LLM
recommendation = llm.invoke(formatted_prompt)
# recommendation.content
display(Markdown(recommendation.content))

Investment Ranking of Companies

Ranked List of Companies

  1. Microsoft (MSFT)
  2. NVIDIA (NVDA)
  3. Google (GOOGL)
  4. Amazon (AMZN)
  5. IBM

Scores and Ratings

| Company   | Score | Rating    |
| --------- | ----- | --------- |
| Microsoft | 9.5   | Excellent |
| NVIDIA    | 9.0   | Very Good |
| Google    | 8.5   | Good      |
| Amazon    | 8.0   | Good      |
| IBM       | 7.0   | Fair      |

Key Reasons Supporting Rankings

1. Microsoft (MSFT)

  • Score: 9.5
  • Strengths:
    • Strong financial metrics with a solid P/E ratio and high revenue growth.
    • Significant investments in AI through Azure AI Foundry Labs, enhancing productivity across its software suite.
    • Strategic partnerships with OpenAI bolster its innovation capabilities.
  • Risks:
    • High competition in the cloud and AI sectors may pressure margins.

2. NVIDIA (NVDA)

  • Score: 9.0
  • Strengths:
    • Leading position in GPU technology essential for AI applications, driving demand across various industries.
    • Innovative projects like DLSS showcase their commitment to enhancing gaming experiences through AI.
    • Strong financial performance with high revenue growth rates and robust market capitalization.
  • Risks:
    • Dependence on the gaming sector could expose it to cyclical downturns.

3. Google (GOOGL)

  • Score: 8.5
  • Strengths:
    • Extensive investment in advanced multimodal models like Gemini, integrating cutting-edge technology into consumer products such as Search and Workspace.
    • Strong focus on responsible AI development aligns well with regulatory trends, ensuring long-term sustainability.
  • Risks:
    • Regulatory scrutiny over data privacy could impact operations.

4. Amazon (AMZN)

  • Score: 8.0
  • Strengths:
    • Comprehensive use of AI across retail operations significantly enhances the customer experience.
    • Initiatives like SageMaker democratize access to machine learning tools for businesses of all sizes.
    • Strong revenue generation from AWS services bolsters overall performance.
    • Ongoing innovations such as Bedrock enhance generative capabilities, further strengthening its market position.
  • Risks:
    • Intense competition from other tech giants can affect pricing strategies.
    • Potential challenges related to data privacy must be managed effectively.

5. IBM

  • Score: 7.0
  • Strengths:
    • An established presence in enterprise solutions, with platforms like Watsonx demonstrating a commitment to enterprise-grade AI and ethical usage standards.
    • Ongoing investments signal dedication to maintaining relevance in a competitive landscape.
  • Risks:
    • Slower adoption rates compared to competitors, due largely to lackluster marketing efforts and product visibility issues.

This evaluation provides a clear perspective for investors looking at these companies based on both quantitative metrics and qualitative insights into their strategic positioning within the industry landscape!

6. Summary and Recommendation - 4 Marks

Based on the project, learners are expected to share their observations, key learnings, and insights related to the business use case, including any challenges they encountered. Additionally, they should recommend improvements to the project and suggest further steps for enhancement.

A. Summary / Your Observations about this Project - 2 Marks

  1. The project effectively combines financial growth metrics with strategic insights from organizational reports using a RAG-based DualLens approach.

  2. Investment rankings align with market leaders (Microsoft, NVIDIA, Google), indicating accurate retrieval and meaningful synthesis by the LLM.

  3. The approach improves explainability, as recommendations are supported by both quantitative data and qualitative strategy analysis.

B. Recommendations for this Project / What improvements can be made to this Project - 2 Marks

  1. Introduce a structured, weighted scoring framework to improve consistency and reduce subjectivity in LLM-generated scores.

  2. Add time-aware retrieval and risk/confidence indicators to avoid outdated strategic insights and improve reliability.

  3. Validate the system through backtesting and portfolio-level analysis to measure real-world investment performance.

4. The analysis can be improved by pulling additional data points from the yfinance library for fundamental analysis.

5. Two separate recommendations could be generated, one for short-term and one for long-term investments.
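The weighted scoring framework proposed in recommendation B.1 could look like the following sketch. The criteria, weights, and sub-scores are illustrative placeholders chosen for the example, not values produced by this project:

```python
def weighted_score(sub_scores, weights):
    """Combine 0-10 sub-scores into one overall score; weights should sum to 1."""
    return sum(sub_scores[k] * weights[k] for k in weights)

# Hypothetical criteria and weights -- tune these to the investor's priorities
weights = {"growth": 0.4, "innovation": 0.3, "risk": 0.2, "strategy": 0.1}
msft_sub_scores = {"growth": 9, "innovation": 10, "risk": 8, "strategy": 9}
overall = weighted_score(msft_sub_scores, weights)  # ~9.1
```

Asking the LLM to emit only the per-criterion sub-scores and computing the final ranking deterministically in code reduces the run-to-run variability of fully LLM-generated scores.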